In the design and execution of a research project, an early step
forward is taken when one proposes the mathematical model for how the
observed data are the result of a set of hypothesized influences.
This model will state how each influence enters as a parameter (e.g.,
item difficulty, treatment effect), how parameters interact (e.g.,
additive, multiplicative), and what conditions are assumed. A useful
model provides a potential for predicting similar observations.

Models, then, serve as frames of reference for understanding the
unseen, but hypothesized, forces operating in the world around us.
However useful, models are never perfect. In any application, some
experimental error is inevitable. This is due to our inability to
completely control and measure the variables of interest.
Consequently, the limits of models must be tested continually as we
search to reveal their generality.

Since models cannot serve as exact blueprints, there must be
provided a mechanism that can assess how well models do function at
least as rough outlines. In general the problem can be addressed by
focusing on discrepancies between the observed data and their modelled
estimates. The behavior of these discrepancies, or residuals, can
usually be determined for situations ranging from good fit to
extreme lack of fit. Once a baseline distribution of residuals is
established it may be possible to examine a set of residuals for
surprising, unexpected features. Unexpected features might be traced
back to some anomaly in the original data or to an oversight in the
parameterization of the model. They also might not have a reasonable
explanation.



This simple preface on model building and model testing is
relevant when one conducts a quantitative research synthesis. One
purpose for combining research studies is to estimate a population
treatment effect (Hedges, 1982a). The internal validity of a model
for how effect size estimates should be computed and combined will
hinge upon the homogeneity of the effect size variation. Effect size
variation may be assessed in the form of a summary fit statistic, and a
direct consideration of the extent of individual effect variation from
the population estimate. This paper presents some diagnostic
techniques that facilitate the analysis of effect size variation.

Effect size estimates may reflect a variety of comparisons, e.g.,
means, correlations, proportions (Cohen, 1977). For illustrative
purposes we consider the case where the estimate represents the
standardized difference between a pair of means. If we let the effect
size estimate for a single study be



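\[ G_j = \frac{\bar{Y}^E_j - \bar{Y}^C_j}{S_j} \qquad \text{EQ. 1} \]

where:

$j$ = 1, 2, ..., k studies
$\bar{Y}^E_j$ = experimental group mean
$\bar{Y}^C_j$ = control group mean

with $S_j^2$ as the pooled estimate of the variance, where

\[ S_j^2 = \frac{(n^E_j - 1)(S^E_j)^2 + (n^C_j - 1)(S^C_j)^2}{n^E_j + n^C_j - 2} \qquad \text{EQ. 2} \]

and $n^E_j$ = experimental group n; $n^C_j$ = control group n,
we can specify a weighted mean estimate of the
population effect as

\[ \mathrm{GDOT} = \frac{\sum_{j=1}^{k} G_j / \mathrm{VARG}_j}{\sum_{j=1}^{k} 1 / \mathrm{VARG}_j} \qquad \text{EQ. 3} \]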



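where, following Hedges (1982a), the variance of an individual
estimate is

\[ \mathrm{VARG}_j = \frac{n^E_j + n^C_j}{n^E_j \, n^C_j} + \frac{G_j^2}{2(n^E_j + n^C_j)} \qquad \text{EQ. 4} \]

The estimate GDOT has variance

\[ \mathrm{VGDOT} = \frac{1}{\sum_{j=1}^{k} 1 / \mathrm{VARG}_j} \qquad \text{EQ. 5} \]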
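The chain from group summary statistics to the weighted mean can be sketched in a few lines of Python. This is a minimal illustration rather than the author's program: the function names and the three sets of summary statistics are hypothetical, and VARG is taken to be the usual large-sample variance of a standardized mean difference, which is an assumption about Eq. 4.

```python
import math

def effect_size(mean_e, mean_c, sd_e, sd_c, n_e, n_c):
    """Standardized mean difference G (Eq. 1) using the pooled variance (Eq. 2)."""
    pooled_var = ((n_e - 1) * sd_e ** 2 + (n_c - 1) * sd_c ** 2) / (n_e + n_c - 2)
    return (mean_e - mean_c) / math.sqrt(pooled_var)

def varg(g, n_e, n_c):
    """Large-sample variance of G (assumed form of Eq. 4)."""
    return (n_e + n_c) / (n_e * n_c) + g ** 2 / (2 * (n_e + n_c))

def weighted_mean(gs, variances):
    """Return (GDOT, VGDOT): the inverse-variance weighted mean and its variance."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    gdot = sum(w * g for w, g in zip(weights, gs)) / total
    return gdot, 1.0 / total

# Hypothetical summary statistics: (mean_e, mean_c, sd_e, sd_c, n_e, n_c)
studies = [
    (52.0, 50.0, 10.0, 10.0, 40, 40),
    (55.0, 50.0, 9.0, 11.0, 25, 30),
    (51.0, 50.5, 10.0, 10.0, 200, 190),
]
gs = [effect_size(*s) for s in studies]
vs = [varg(g, s[4], s[5]) for g, s in zip(gs, studies)]
gdot, vgdot = weighted_mean(gs, vs)
print("G:", [round(g, 3) for g in gs])
print("GDOT:", round(gdot, 3), "VGDOT:", round(vgdot, 5))
```

Because the weights are reciprocal variances, the third (largest) study dominates GDOT, and VGDOT is smaller than the variance of any single estimate.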
A 95% asymptotic confidence interval can be expressed as

\[ \mathrm{GDOT} \pm 1.96 \sqrt{\mathrm{VGDOT}} \]

The model for GDOT assumes that individual studies provide independent
estimates of a common population parameter, or in hypothesis testing
terms:

\[ H_0: \delta_j = \delta, \qquad j = 1, 2, \ldots, k \]

A test of this hypothesis can be expressed as

\[ H = \sum_{j=1}^{k} \frac{(G_j - \mathrm{GDOT})^2}{\mathrm{VARG}_j} \qquad \text{EQ. 6} \]

(The interested reader will find the complete presentation of
Eq. 1 thru Eq. 6 in Hedges, 1982a.)

If $H_0$ is true, then the test statistic H has an asymptotic
chi-square distribution with k - 1 degrees of freedom. If H is not
significant, then we accept $H_0$. If H is significant, then we reject
the hypothesis that all $\delta_j$ are equal.
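A minimal sketch of the homogeneity test follows, assuming H is the weighted sum of squared deviations from GDOT and that it is referred to a chi-square distribution with k - 1 degrees of freedom. The function names and the five effect estimates are hypothetical. The tail probability is computed with a series for the incomplete gamma function so that only the standard library is needed.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-16:          # series for the lower incomplete gamma
        term *= half_x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def homogeneity(gs, variances):
    """Return GDOT, its 95% interval, H, df = k - 1, and the p-value."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    gdot = sum(w * g for w, g in zip(weights, gs)) / total
    half_width = 1.96 * math.sqrt(1.0 / total)
    h = sum(w * (g - gdot) ** 2 for w, g in zip(weights, gs))
    df = len(gs) - 1
    return gdot, (gdot - half_width, gdot + half_width), h, df, chi2_sf(h, df)

# Hypothetical effect estimates and variances for five studies.
gs = [0.10, 0.35, 0.42, 0.28, 0.55]
variances = [0.020, 0.045, 0.030, 0.015, 0.060]
gdot, interval, h, df, p = homogeneity(gs, variances)
print(f"GDOT={gdot:.3f} CI=({interval[0]:.3f}, {interval[1]:.3f}) "
      f"H={h:.2f} df={df} p={p:.3f}")

# The tail function can also be applied to H values reported in this paper.
print(chi2_sf(8.92, 6), chi2_sf(84.48, 38))
```

The last line gives tail probabilities for two H values quoted in the text; they fall above .10 and below .0001 respectively.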

With the finding of a significant H the analytic problem becomes
one of determining which effect estimates contributed to the lack of
fit. Between-group differences can be tested with the categorical or
continuous model fitting techniques proposed by Hedges (1982b, 1982c),
provided one has the information necessary to specify the groups.
Those techniques help explain a lack of fit by revealing when effects
are homogeneous within groups but heterogeneous between groups.

It might be the case that classification variables are not
available or are not apparent. In this situation the investigator may
wish to address the residuals from the model fitting process.
Residuals are the differences between individual effects and the
population estimate (Hedges and Olkin, 1984). A typical
representation of an estimated residual and its standardized form is

\[ \mathrm{URES}_j = G_j - \mathrm{GDOT}, \qquad \mathrm{SRES}_j = \frac{\mathrm{URES}_j}{\sqrt{\widehat{\mathrm{Var}}(\mathrm{URES}_j)}} \qquad \text{EQ. 7} \]

When the model holds, this residual has an expected value and
variance of [values illegible in the source].
This residual is computed as the difference between the j'th effect
and the population estimate, where the population estimate includes
the j'th effect.

If we want the difference between a particular effect and the
other members of the sample, then we might be more interested in
representing an estimated residual and its standardized form as

\[ \mathrm{URESJ}_j = G_j - \mathrm{GDOTJ}_j, \qquad \mathrm{SRESJ}_j = \frac{\mathrm{URESJ}_j}{\sqrt{\widehat{\mathrm{Var}}(\mathrm{URESJ}_j)}} \qquad \text{EQ. 8} \]

where GDOTJ means the population estimate does not contain the j'th
effect in the calculation. When the model holds, this residual has an
expected value and variance of [values illegible in the source].
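The two residual definitions can be sketched side by side. The denominators used here are assumptions, since the printed equations are not legible: SRES is divided by the square root of VARG for the study, and SRESJ by the square root of VARG plus the variance of the estimate computed without that study. The function name and data are hypothetical.

```python
import math

def residual_pairs(gs, variances):
    """Return a list of (SRES, SRESJ) pairs, one pair per study."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    weighted_sum = sum(w * g for w, g in zip(weights, gs))
    gdot = weighted_sum / total
    pairs = []
    for g, v, w in zip(gs, variances, weights):
        sres = (g - gdot) / math.sqrt(v)                # assumed denominator
        gdotj = (weighted_sum - w * g) / (total - w)    # estimate without study j
        vgdotj = 1.0 / (total - w)                      # its variance
        sresj = (g - gdotj) / math.sqrt(v + vgdotj)     # assumed denominator
        pairs.append((sres, sresj))
    return pairs

# Hypothetical effect estimates and variances.
gs = [0.10, 0.35, 0.42, 0.28, 0.55, -0.20]
variances = [0.020, 0.045, 0.030, 0.015, 0.060, 0.025]
for sres, sresj in residual_pairs(gs, variances):
    print(f"SRES={sres:+.3f}  SRESJ={sresj:+.3f}  DIFF={sresj - sres:+.3f}")
```

Under these assumptions each SRESJ is at least as extreme as the matching SRES, because removing a study both moves the population estimate away from that study and inflates the denominator by a smaller factor.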

Note that the numerators and denominators in Eq. 7 and Eq. 8
differ. In Eq. 8 the numerator will reflect a larger discrepancy
between a given effect and the population estimate than will the
difference computed with Eq. 7. The denominator, too, will be larger
but the rate of change will be less than that of the numerator. The
overall result is that residuals computed under Eq. 8 are larger in
absolute value than their counterparts computed under Eq. 7. Table 1
illustrates the difference between the two alternatives. The SRES
residuals are computed from the population estimate with all studies
included (GDOT). The SRESJ residuals are computed based on the
population estimate with the j'th study removed (GDOTJ). The
differences in the pairs of standardized residuals (SRESJ-SRES) are
listed in the DIFF column. In each instance SRESJ is more extreme
than SRES, i.e., |SRESJ| - |SRES| > 0. It is asserted that these
differences indicate that SRESJ is a more sensitive statistic than
SRES for detecting heterogeneous variation. The remainder of this
discussion addresses standardized residuals computed according to
Eq. 8.

Regardless of the form of the residual, the observed sum of the
residuals is not likely to equal zero. Although each residual has an
expected value of zero when the data fit the model and, therefore, the
expected value of the sum of those residuals is zero, there is no
algebraic requirement that estimated residuals must sum to zero when
they are computed relative to a weighted estimator.
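A short numerical check makes the point concrete. With inverse-variance weights it is the weighted sum of the unstandardized residuals that vanishes, not the plain sum. The three effect estimates and variances are hypothetical.

```python
# Hypothetical effect estimates with unequal variances.
gs = [0.1, 0.5, 0.9]
variances = [0.01, 0.04, 0.09]

weights = [1.0 / v for v in variances]
gdot = sum(w * g for w, g in zip(weights, gs)) / sum(weights)

ures = [g - gdot for g in gs]                       # unstandardized residuals
plain_sum = sum(ures)                               # generally not zero
weighted_sum = sum(w * u for w, u in zip(weights, ures))  # zero by construction

print(f"GDOT={gdot:.4f}")
print(f"sum of residuals={plain_sum:.4f}")
print(f"weighted sum of residuals={weighted_sum:.2e}")
```

The plain sum would be zero only if GDOT happened to coincide with the unweighted mean of the estimates, for example when all variances are equal.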

When an analysis of residuals is undertaken it is reasonable to
decide first on an analytic approach that takes into account the
number of studies under consideration. This is because residuals from
an analysis based on 10 or fewer studies do not normally require the
techniques that are useful when [number illegible in the source] or
more studies are involved. This is a relevant consideration because
sub-analyses of effect size data frequently involve fewer and fewer
studies. For a small analysis it is usually sufficient to construct a
table containing the original estimates, their residuals, the H
statistic computed if that study were removed from the analysis (HJ),
and the upper and lower 95% confidence interval if that study were
removed. (It is relatively easy during the initial pass through the
data to compute the second-step statistics that result when a given
study is removed.)
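One way to obtain the study-removed statistics in a single pass is to reuse the full-sample totals. The sketch below assumes the inverse-variance weights are held fixed when a study is dropped; under that assumption GDOTJ and HJ follow algebraically from GDOT, H, and the weight of the removed study. The function names and data are hypothetical.

```python
import math

def summary(gs, variances):
    """Return (GDOT, VGDOT, H) for inverse-variance weights 1/VARG."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    gdot = sum(w * g for w, g in zip(weights, gs)) / total
    h = sum(w * (g - gdot) ** 2 for w, g in zip(weights, gs))
    return gdot, 1.0 / total, h

def leave_one_out(gs, variances):
    """Rows of (G, GDOTJ, HJ, lower, upper) built from full-sample totals only."""
    gdot, vgdot, h = summary(gs, variances)
    total = 1.0 / vgdot
    rows = []
    for g, v in zip(gs, variances):
        w = 1.0 / v
        rest = total - w                                   # weight left after removal
        gdotj = (total * gdot - w * g) / rest
        hj = h - w * (g - gdot) ** 2 * total / rest
        half = 1.96 * math.sqrt(1.0 / rest)                # 95% half-width for GDOTJ
        rows.append((g, gdotj, hj, gdotj - half, gdotj + half))
    return rows

# Hypothetical effect estimates and variances.
gs = [0.10, 0.35, 0.42, 0.28, 0.55]
variances = [0.020, 0.045, 0.030, 0.015, 0.060]
rows = leave_one_out(gs, variances)
print("    G   GDOTJ     HJ   lower   upper")
for g, gdotj, hj, lo, hi in rows:
    print(f"{g:5.2f} {gdotj:7.3f} {hj:6.2f} {lo:7.3f} {hi:7.3f}")
```

Each row agrees with a direct refit on the remaining studies, so the shortcut trades k separate passes for one.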

An example of a summary and diagnostic table is presented as
Table 2. The observed homogeneity statistic (H=8.92, df=6, p>.10),
for the population estimate (GDOT=.326), is consistent with the
hypothesis of homogeneous effects. The 95% confidence interval (.230
to .423) does not include zero. The iterated estimator of GDOT shows
only a slight improvement, as expected when the data fit the model
(Hedges, 1982a, Eq. 14). In the diagnostic statistics section the
residual for Study 3 (SRESJ=-2.01) results from a relatively small
effect estimate (G=-.03), a fact reflected in the H statistic computed
if this study were removed (a decline from H=8.92 to HJ=4.87). We
also note that removal of this study would result in a rise in the
estimate of GDOT from .326 to .35 (GDOTJ). None of the individual 95%
confidence intervals include zero. We conclude that the individual
estimates fit the model and the population estimate is significantly
greater than zero.

The analytic situation changes when more than 10 to 20 studies
are involved. The basic problem is that it is difficult to scan long
columns of values in any systematic manner. It is, however,
relatively easy to construct a few simple bivariate plots to aid in
detecting sources of variation inconsistent with the model. The
purpose of these plots is to focus on the continuity and the range of
the distribution of the residuals. That is, do the residuals form a
narrow, unbroken pattern or do they tend to form clusters separated by
recognizable gaps with occasional outliers lying a considerable
distance from the main body?

The plots presented in this paper do not address whether or not
the distribution of residuals fits a specific family of distributions.
There are standard techniques for assessing the statistical
distribution of a set of residuals, but they have yet to attract
serious attention. Perhaps this is because the analytic question which
these techniques address has not demonstrated its practical
significance in a quantitative synthesis context. In part, this is
because techniques such as probability plotting have been shown to be
highly variable when relatively few data points are plotted (cf.
Daniel and Wood, 1980, Appendix 3A).

The following discussion is based on data collected in 1983 by
Professor George Hillocks, Jr. His purpose was to determine if
instructional strategies lead to a significant improvement in writing
composition skills, and if so, whether there is a significant
difference between the effects of the alternative strategies. A total
of 39 studies were included. They represent six instructional
strategies: grammar (k=5), models (k=7), sentence combining (k=5),
scales (k=6), inquiry (k=6), and free writing (k=10). The fit of these
data to the model was inconsistent with the hypothesis of homogeneous
effect size variation (H=84.48, df=38, p<.0001). An analysis of the
residuals was undertaken.






It is evident that the relation between an effect estimate and
its residual is necessarily linear. In fact, the relation will be
perfectly linear for G versus URES or G versus URESJ plots. This
relation is demonstrated in Figure 1. Linearity may, however, be less
than perfect for G versus SRES or for G versus SRESJ plots.

A less than perfect linear relation is possible because of the
influence of sample size in computing the variance of an effect
estimate. More weight, in the form of a smaller error variance, is
attributed to studies with the larger samples; thus, it is possible
for the more extreme of two effect estimates to have the smaller
estimated residual. This means that: a) sole consideration of the
largest and smallest effect estimates is not sufficient, and b) the
preferred choice of residual will usually be one that has been
standardized. Obviously, if the sample sizes are identical the
unstandardized and standardized residual plots will be identical.

A plot of G versus SRESJ can be interesting because the
heterogeneous G estimates will be found in the tails of the
distribution. Our attention will be drawn to gaps in the distribution
or clusters in the tails. An example is provided in Figure 2. Two
features are noteworthy. The first is the less than perfect linear
relation (though r=.94). Study A has the most negative effect
estimate (G=-.27) but its residual (SRESJ=-2.32) is not as extreme as
the residual for study B (G=.05, SRESJ=-3.10). This is because of the
difference in sample sizes. Study A contains samples of 41 and 36
persons. Study B consists of samples with 426 and 371 persons. The
difference between the study B estimate and the population estimate
was accorded more weight than the difference between study A and the
population estimate because study B was based on considerably larger
samples.
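The effect of sample size on the two studies can be illustrated with the values quoted above. This is a rough sketch only: VARG is taken to be the usual large-sample variance of a standardized mean difference (an assumption), the reference value of 0.28 is hypothetical and stands in for the population estimate, and the variance of that estimate is ignored, so the output will not reproduce the reported SRESJ values exactly.

```python
import math

def varg(g, n_e, n_c):
    """Large-sample variance of a standardized mean difference (assumed form)."""
    return (n_e + n_c) / (n_e * n_c) + g ** 2 / (2 * (n_e + n_c))

def standardized(study, reference):
    """Return (standard error, standardized distance from the reference value)."""
    se = math.sqrt(varg(study["g"], study["n_e"], study["n_c"]))
    return se, (study["g"] - reference) / se

# Effect estimates and sample sizes reported in the text for studies A and B.
study_a = {"g": -0.27, "n_e": 41, "n_c": 36}
study_b = {"g": 0.05, "n_e": 426, "n_c": 371}
reference = 0.28  # hypothetical population estimate, not a value from the text

se_a, z_a = standardized(study_a, reference)
se_b, z_b = standardized(study_b, reference)
print(f"Study A: raw difference={study_a['g'] - reference:+.2f} SE={se_a:.3f} z={z_a:+.2f}")
print(f"Study B: raw difference={study_b['g'] - reference:+.2f} SE={se_b:.3f} z={z_b:+.2f}")
```

Study A lies farther from the reference value in raw units, but its standard error is more than three times that of study B, so study B is the more extreme of the two once the difference is standardized.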

The tails of the distribution define the second interesting
feature. The three studies in the negative tail represent
"free writing" strategies. Two of the three studies in the positive
tail belong to the "inquiry" category.

What consequences do these extreme estimates have upon the
overall fit of the model? A natural plot to construct
would be G versus HJ, each effect estimate plotted against the
homogeneity of the sample if that study were removed. The problem
with this particular plot is that it does not fully represent the
adverse impact of a potentially heterogeneous study. This is because
a plot with G as an axis does not take into account the sample size.
Thus, it is the standardized residual (SRESJ) versus HJ plot which is
of interest for assessing the extent of heterogeneity contributed by
individual studies.

Figure 3 illustrates the relation between the standardized
residuals and the improvement of fit to the model if their respective
studies are removed (SRESJ versus HJ). The plot is necessarily
quadratic because increasingly larger and smaller effects diverge from
the population estimate. The interesting regions are the extremes of
the curve where either the curve extends for a substantial distance or
gaps occur.
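The quadratic shape can be made explicit under two assumptions: the inverse-variance weights are held fixed when a study is removed, and SRESJ is standardized by the square root of VARG plus the variance of the study-removed estimate. In that case HJ = H - SRESJ^2 exactly, so the points trace an inverted parabola with its peak at H. The values reported for Study 3 in Table 2 are consistent with this reading: H - HJ = 8.92 - 4.87 = 4.05, while SRESJ^2 = (-2.01)^2 is about 4.04. The function name and data below are hypothetical.

```python
import math

def loo_points(gs, variances):
    """Return H and a list of (SRESJ, HJ) points from explicit leave-one-out fits."""
    def fit(g_list, v_list):
        weights = [1.0 / v for v in v_list]
        total = sum(weights)
        gdot = sum(w * g for w, g in zip(weights, g_list)) / total
        h = sum(w * (g - gdot) ** 2 for w, g in zip(weights, g_list))
        return gdot, 1.0 / total, h

    _, _, h = fit(gs, variances)
    points = []
    for j, (g, v) in enumerate(zip(gs, variances)):
        gdotj, vgdotj, hj = fit(gs[:j] + gs[j + 1:], variances[:j] + variances[j + 1:])
        sresj = (g - gdotj) / math.sqrt(v + vgdotj)   # assumed standardization
        points.append((sresj, hj))
    return h, points

# Hypothetical effect estimates and variances.
gs = [0.10, 0.35, 0.42, 0.28, 0.55, -0.20]
variances = [0.020, 0.045, 0.030, 0.015, 0.060, 0.025]
h, points = loo_points(gs, variances)
for sresj, hj in sorted(points):
    print(f"SRESJ={sresj:+.3f}  HJ={hj:.3f}  H - SRESJ^2={h - sresj ** 2:.3f}")
```

The last two columns agree for every study, which is why a study's position along the curve is determined entirely by its standardized residual.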



The two studies with the greatest over-estimate of the population
effect and two of the studies in the [words illegible in the source]
of points involve "inquiry" strategies. The four studies with the
greatest under-estimate of the population effect involve
"free-writing" strategies. These findings supported the decision to
group and analyze instructional strategies separately (see Hillocks,
[year illegible in the source]).

There can be a problem with this type of plot. Depending on how
wide the plot boundaries are defined, a relatively slight difference
can be transformed into a large gap. One could attempt to produce a
standardized graph by dividing each HJ by the degrees of freedom in
the analysis. This plot still looks quadratic but now the values tend
to be identical if the data fit, or the values tend to deviate from
the main cluster of points by only a slight margin. An example is
presented in Figure 4 (SRESJ versus HJ/DF). How to construct a more
useful standardized plot remains to be discovered.

In conclusion, I emphasize that the use of diagnostic techniques
is not advocated for the ad hoc purpose of finding a best-fitting
subset of studies. Such a purpose is clearly meaningless for
estimating a population effect. The techniques are, however, useful
for revealing why a lack of fit occurred. The issue of whether
individual studies should be removed from consideration or should be
formed into subsets for separate analysis must be based on
methodological considerations that are consistent with the original
criteria stated for including or excluding studies from the original
design. That is, in the initial stages of a project it is possible
that studies have been included that the investigator accepts as
marginally relevant but which are believed to be consistent with the
studies of direct interest. This tactic is taken occasionally when
one seeks to increase the number of studies in the analysis. It may
also be the case that the first test of the data will be to determine
the extent of heterogeneity, given that differences are assumed to
exist but one wants to verify that this is indeed the situation. If
specific studies or groups of studies do not fit the omnibus analysis
and if there was some a priori awareness that they might not, then
their exclusion from consideration or the construction of a subset
based on [words illegible in the source] does seem warranted.



\[ G_j = \frac{\bar{Y}^E_j - \bar{Y}^C_j}{S_j} \]

$j$ = 1, 2, ..., k studies; $\bar{Y}^E$ = experimental group mean;
$\bar{Y}^C$ = control group mean
(G is the effect estimate for study j)

\[ S_j^2 = \frac{(n^E_j - 1)(S^E_j)^2 + (n^C_j - 1)(S^C_j)^2}{n^E_j + n^C_j - 2} \]

$n^E$ = experimental group n
$n^C$ = control group n

\[ \mathrm{GDOT} = \frac{\sum_{j=1}^{k} G_j / \mathrm{VARG}_j}{\sum_{j=1}^{k} 1 / \mathrm{VARG}_j} \]

(GDOT is the weighted mean estimate of the
population effect parameter)

\[ \mathrm{VARG}_j = \frac{n^E_j + n^C_j}{n^E_j \, n^C_j} + \frac{G_j^2}{2(n^E_j + n^C_j)} \]

\[ \mathrm{VGDOT} = \frac{1}{\sum_{j=1}^{k} 1 / \mathrm{VARG}_j} \]